ABSTRACT

The Orientations of Proteins in Membranes (OPM) database is a curated web resource that provides spatial positions of membrane-bound peptides and proteins of known three-dimensional structure in the lipid bilayer, together with their structural classification, topology and intracellular localization. OPM currently contains more than 1200 transmembrane and peripheral proteins and peptides from approximately 350 organisms that represent approximately 3800 Protein Data Bank entries. Proteins are classified into classes, superfamilies and families and assigned to 21 distinct membrane types. Spatial positions of proteins with respect to the lipid bilayer are optimized by the PPM 2.0 method, which accounts for the hydrophobic, hydrogen-bonding and electrostatic interactions of the proteins with the anisotropic water-lipid environment described by the dielectric constant and hydrogen-bonding profiles. The OPM database is freely accessible at http://opm.phar.umich.edu. Data can be sorted, searched or retrieved using the hierarchical classification, source organism or localization in different types of membranes. The database offers downloadable coordinates of proteins and peptides with membrane boundaries. A gallery of protein images and several visualization tools are provided. The database is supplemented by the PPM server (http://opm.phar.umich.edu/server.php), which can be used for calculating spatial positions in membranes of newly determined protein structures or theoretical models.

INTRODUCTION 

More than half of all proteins in cells associate with biological membranes permanently or temporarily. This includes integral monotopic and transmembrane (TM) proteins, which are encoded by 20-30% of sequenced genomes (1), and the more numerous peripheral proteins and peptides that can form transient complexes with membrane lipids or proteins. Recent progress in structure determination techniques (2) has led to significant growth in the number of membrane proteins with known three-dimensional (3D) structures. Currently, there are approximately 1200 and 10 000 entries in the Protein Data Bank (PDB) (3) related to TM and peripheral proteins, respectively, which corresponds to 1.6 and 13% of the PDB content. Many PDB entries represent different complexes, conformations, mutants or crystal forms of the same protein, so the set of distinct proteins is approximately 3-fold smaller.

Integral membrane proteins with known 3D structures can be found in several specialized databases, such as Stephen White's list (4), the Membrane Proteins Data Bank (MPDB) (5) and the transporter classification database (TCDB) (6). These resources provide some complementary information, including bibliography, crystallization and solubilization conditions (5) or classification and phylogenetic analysis of membrane transporters (6). More specialized resources cover membrane-targeting domains [MeTaDoR (7)] and antimicrobial peptides with non-standard amino acids [Peptaibol (8)].

The critical information missing in these databases is the exact position of the membrane boundaries, which is not obvious from the protein 3D structure, even if the protein was crystallized with phospholipids. Spatial positions of membrane-associated proteins with respect to the lipid bilayer can be determined experimentally or computationally. Experimental methods, including chemical modification, spin-labeling, X-ray scattering, neutron diffraction, infrared spectroscopy, electron cryomicroscopy and NMR, are very laborious and have therefore been applied to a limited set of proteins and peptides (9,10). In contrast, a fast and reliable computational approach would allow proteins to be positioned in membranes in a timely manner, keeping pace with the expanding flow of experimentally determined structures.

Several theoretical methods have been applied to positioning proteins in membranes. These are based on molecular dynamics (MD) simulations (11), coarse-grained MD (12), optimization of electrostatic energy (13-16), energy minimization with implicit solvent models (17-21) or membrane depth-dependent scoring functions (22-24). The results for TM proteins have been collected in two databases, the Protein Data Bank of TransMembrane proteins (PDBTM) (25) and the Coarse-Grained DataBase (CGDB) (12,26). PDBTM includes an up-to-date set of 1441 PDB structures (release as of 24 June 2011) of α-helical and β-barrel proteins arranged in the lipid bilayer by the TMDET algorithm (23). CGDB holds pseudo-atom models of approximately 370 TM proteins and around a dozen monotopic proteins generated by coarse-grained MD simulations (12). Both databases focus on integral membrane proteins and do not include peripheral proteins. Predicting the weak association of peripheral proteins with lipid bilayers would require significantly higher precision in calculating membrane binding affinities than the underlying methods can provide.

To fill this gap, we have proposed and recently advanced a method for Positioning of Proteins in Membrane (PPM 2.0). It optimizes the free energy of protein transfer from water to the membrane environment using an anisotropic solvent model of the lipid bilayer (9,27). The method was thoroughly verified for several dozen TM and peripheral proteins and peptides whose arrangements in membranes have been studied experimentally (9,10,27). The high computational efficiency of PPM 2.0 allows its application to large-scale analysis of proteins from the PDB. The results are deposited in our Orientations of Proteins in Membranes (OPM) database, which includes both TM and peripheral proteins (28). Hence, it covers a significantly larger number of membrane-associated macromolecules (1255 proteins and peptides) and PDB entries (3766 structures) than PDBTM and CGDB. It also provides a four-level protein classification system. In addition, it gives information about protein topology, type of intracellular membrane and source organism, with comparison to experimental publications on the arrangement of the corresponding proteins in membranes.



DATABASE CONTENT 

The OPM database was established in December 2005 at the College of Pharmacy of the University of Michigan. The database currently holds 427 TM proteins, 725 peripheral proteins and 103 membrane-active peptides related to 3766 PDB entries (Figure 1).

The database includes only protein structures whose spatial positions in membranes can be computationally predicted, rather than a complete set of all membrane-associated proteins from the PDB. The positions of many peripheral proteins in membranes cannot be calculated because their membrane-anchoring structures (amphiphilic helices or loops, lipidated residues or specifically bound lipids) are disordered or missing in the experimental structure. In addition, peripheral proteins may adopt alternative conformations, some of which are not membrane-associated.

Figure 1. Current distribution of different OPM entry types (as of 20 July 2011). Numbers of representative structures are indicated, as well as numbers of related PDB entries (in parentheses).

All data are organized in pages associated with every protein class, superfamily, family or individual protein. To deal with the significant redundancy of PDB data, we select a 'representative' PDB entry for each protein. This entry is the most complete protein structure: it includes the maximal number of protein domains and the fewest disordered segments. Several 'representative' structures of the same protein are selected if they correspond to distinct conformational states or alternative quaternary complexes of the protein. All other available PDB entries of the same protein are included as 'related' structures linked to the 'representative' structure.
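The rule for choosing a 'representative' entry can be pictured as a ranking problem. The following Python sketch is a hypothetical illustration and not OPM's curation code. The `Entry` record, its fields, the state labels and the PDB-ID tie-breaker are assumptions; in practice curators make this selection. Within each conformational state, the sketch prefers the entry with the most domains and then the fewest disordered segments. Every other entry becomes 'related'.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Entry:
    """Hypothetical summary of one PDB entry of a protein (illustrative fields)."""
    pdb_id: str
    n_domains: int     # protein domains resolved in the structure
    n_disordered: int  # disordered or missing segments
    state: str         # conformational state or quaternary complex label


def pick_representatives(entries):
    """Pick one 'representative' per state; all other entries become 'related'.

    Ranking: most domains first, then fewest disordered segments,
    then PDB ID as a deterministic tie-breaker (an assumption).
    """
    by_state = sorted(entries, key=lambda e: e.state)
    representatives, related = [], []
    for _, group in groupby(by_state, key=lambda e: e.state):
        ranked = sorted(group, key=lambda e: (-e.n_domains, e.n_disordered, e.pdb_id))
        representatives.append(ranked[0])
        related.extend(ranked[1:])
    return representatives, related


# Placeholder entries (not real PDB IDs or data).
entries = [
    Entry("entA", n_domains=1, n_disordered=2, state="open"),
    Entry("entB", n_domains=2, n_disordered=3, state="open"),
    Entry("entC", n_domains=2, n_disordered=1, state="open"),
    Entry("entD", n_domains=2, n_disordered=0, state="closed"),
]
reps, rel = pick_representatives(entries)
print([e.pdb_id for e in reps])  # ['entD', 'entC']
print([e.pdb_id for e in rel])   # ['entB', 'entA']
```

Grouping by state first mirrors the text's allowance for several representatives per protein when distinct conformations exist.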

A 'representative' protein page (Figure 2) displays the protein name, classification, subcellular localization (or destination membrane), source organism, protein topology (the membrane side associated with the protein N-terminus), number of subunits and links to other web resources. Another set of data describes the arrangement of the protein in the lipid bilayer as calculated by PPM 2.0. It includes: (i) downloadable atomic coordinates of the protein, with lipid bilayer boundaries (located at the level of lipid carbonyls) indicated by dummy atoms; (ii) orientational parameters (tilt angle, hydrophobic thickness or membrane penetration depth); (iii) membrane binding energies; and (iv) a list of TM segments.
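The membrane boundaries are stored as dummy atoms in the coordinate files, so the hydrophobic thickness can be recovered from a downloaded file. The sketch below rests on three assumptions that this text does not confirm: the dummy atoms use residue name `DUM`, the file follows the standard fixed-column PDB layout, and the membrane normal is the z axis, so each boundary plane has a single z value. `pdb_line`, `membrane_planes` and `hydrophobic_thickness` are hypothetical helpers.

```python
def pdb_line(record, serial, name, resname, x, y, z):
    """Build a minimal fixed-column PDB ATOM/HETATM line (for the demo only)."""
    return (f"{record:<6s}{serial:5d} {name:<4s} {resname:>3s} A   1    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def membrane_planes(pdb_lines, dummy_resname="DUM"):
    """Distinct z-coordinates of dummy atoms, assuming z is the membrane normal."""
    planes = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and line[17:20].strip() == dummy_resname:
            planes.add(round(float(line[46:54]), 1))  # PDB columns 47-54 hold z
    return sorted(planes)


def hydrophobic_thickness(pdb_lines):
    """Distance between the two boundary planes, in Angstrom."""
    planes = membrane_planes(pdb_lines)
    if len(planes) != 2:
        raise ValueError(f"expected 2 boundary planes, found {len(planes)}")
    return planes[1] - planes[0]


# Synthetic example: one protein atom plus dummy atoms on two planes.
lines = [
    pdb_line("ATOM", 1, "CA", "LEU", 1.0, 2.0, 3.5),
    pdb_line("HETATM", 2, "N", "DUM", 0.0, 0.0, -15.2),
    pdb_line("HETATM", 3, "N", "DUM", 2.0, 0.0, -15.2),
    pdb_line("HETATM", 4, "O", "DUM", 0.0, 0.0, 15.2),
]
print(membrane_planes(lines))        # [-15.2, 15.2]
print(hydrophobic_thickness(lines))  # 30.4
```

Check the residue name and axis convention against an actual downloaded file before relying on this.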

Data visualization is provided by static images and by dynamic images generated by freely available interactive tools. Oligomeric states are taken from PDBe (29) or generated by PISA (30), except in a number of cases in which biological units were chosen in accordance with publications. For example, secretory phospholipases A2 and cytochromes P450 were taken in the physiologically relevant monomeric state, even though PISA identifies some of them as stable dimers. Topology and intracellular localization of proteins were usually taken from the corresponding publications on protein structure determination. For some peripheral proteins, however, topology data from UniProt (31) were used and compared across homologous proteins in the database to minimize potential errors. Annotation and experimental verification of the calculated arrangement in the membrane (with PubMed links) are provided for some well-studied proteins.

Figure 2. Example of entry page for C2 domain of peripheral protein phospholipase A2.
[Screenshot of the OPM page for PDB 1rlw (cytosolic phospholipase A2, group IVA). Panels: classification (Type 2, Monotopic/peripheral; Class 2.2, All beta monotopic/peripheral; Superfamily 2.2.11, C2 domain; Family 2.2.11.01, C2 domain), species, localization, depth 6.4 ± 1.9 Å, tilt angle 57 ± 17°, ΔG_transf −7.8 kcal/mol, topology, resolution, links, download coordinates, experimental verification and comments.]

A 'related' protein page provides downloadable atomic coordinates of the protein with lipid bilayer boundaries presented by dummy atoms, static and dynamic protein images, and links to related web resources.

PROTEIN CLASSIFICATION 

The classification has a four-level hierarchy:

- type: TM proteins, peripheral/monotopic proteins and peptides;
- class: α-helical polytopic, α-helical bitopic and β-barrel TM proteins, and all-α, all-β, α+β and α/β peripheral/monotopic proteins;
- superfamily: evolutionarily related proteins;
- family: proteins with clear sequence homology.

Multi-domain proteins and their complexes are classified based on the Pfam (32), SCOP (33) and TCDB (6) classification of their largest membrane-associated domain. OPM superfamilies usually correspond to Pfam clans and SCOP superfamilies, whereas OPM families correspond to Pfam, SCOP and TCDB families.
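Entry pages show this hierarchy as dotted family codes, for example 2.2.11.01 for the C2 domain family in Figure 2, with one field per level. The sketch below assumes that every prefix of a code names its parent level. That matches the Figure 2 example but is not otherwise documented here. Under that assumption, a hypothetical helper can expand a code into its levels and test whether it belongs to a higher-level group.

```python
LEVELS = ("type", "class", "superfamily", "family")


def classification_levels(code):
    """Expand a dotted code such as '2.2.11.01' into its hierarchy prefixes."""
    parts = code.split(".")
    if not 1 <= len(parts) <= len(LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a classification code: {code!r}")
    return {LEVELS[i]: ".".join(parts[: i + 1]) for i in range(len(parts))}


def is_within(code, ancestor):
    """True if `code` equals `ancestor` or lies below it in the hierarchy."""
    return code == ancestor or code.startswith(ancestor + ".")


print(classification_levels("2.2.11.01"))
# {'type': '2', 'class': '2.2', 'superfamily': '2.2.11', 'family': '2.2.11.01'}
print(is_within("2.2.11.01", "2.2"), is_within("2.2.110.01", "2.2.11"))  # True False
```

Matching on `ancestor + "."` rather than a bare prefix avoids false hits such as 2.2.110 being treated as part of 2.2.11.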



POSITIONING OF PROTEINS IN MEMBRANES 

The spatial positions of proteins in membranes are calculated by the advanced version of our method, PPM 2.0. It combines an all-atom representation of the solute, an anisotropic solvent representation of the lipid bilayer and a universal solvation model (27). This is a general physical method that does not require parameter adjustment for different classes of molecules. The anisotropic properties of the lipid bilayer are described by transbilayer profiles of the dielectric constant and of hydrogen-bonding acidity and basicity parameters. We use polarity profiles of the 1,2-dioleoyl-sn-glycero-3-phosphocholine (DOPC) bilayer derived from experimental distributions of quasi-molecular segments of lipids determined by neutron and X-ray scattering (34). We also use the transbilayer distribution of water in the DOPC bilayer determined in spin-labeling experiments (35). The location of a protein in the membrane coordinate system is obtained by optimizing the energy of protein transfer from water to the lipid bilayer (ΔG_transf). The transfer energy is calculated as the sum of two terms:

(i) a solvent-accessible surface area-dependent term that accounts for van der Waals and hydrogen-bonding solvent-solute interactions and the entropy of solvent molecules in the first solvation shell;
(ii) an electrostatic term that includes the solvation energy of dipoles and ions and the deionization penalty of ionizable groups in a non-polar environment.

The method also accounts for preferential solvation of protein groups by water and for hydrophobic mismatch in TM proteins.
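To make the idea of positioning by energy optimization concrete, here is a deliberately simplified Python toy. It is not PPM 2.0. The surface-area and electrostatic terms and the DOPC polarity profiles are replaced by a single hypothetical sigmoidal profile, close to 1 in the hydrocarbon core and 0 in water. Each atom gets one assumed weight in place of its surface-area term. The toy then finds the tilt and the shift along the membrane normal (z) that minimize the summed energy by exhaustive grid search. The text does not say which optimizer PPM 2.0 uses, and the 15 Å half-thickness and grid spacing are arbitrary choices.

```python
import math


def membrane_profile(z, half_thickness=15.0, width=1.0):
    """Toy profile: ~1 inside the hydrocarbon core (|z| < half_thickness), ~0 in water."""
    return 1.0 / (1.0 + math.exp((abs(z) - half_thickness) / width))


def transfer_energy(atoms, tilt_deg, shift):
    """Toy transfer energy (kcal/mol) of atoms tilted about x and shifted along z.

    atoms: (x, y, z, w) tuples; w is an assumed per-atom weight standing in for the
    surface-area term (negative = favourable to bury, positive = polar penalty).
    """
    t = math.radians(tilt_deg)
    energy = 0.0
    for _x, y, z, w in atoms:
        z_new = y * math.sin(t) + z * math.cos(t) + shift  # rotate about x, then shift
        energy += w * membrane_profile(z_new)
    return energy


def optimize_position(atoms):
    """Exhaustive grid search: tilt 0-90 deg (1 deg steps), shift -20..20 A (0.5 A steps)."""
    best = None
    for tilt in range(0, 91):
        for k in range(81):
            shift = -20.0 + 0.5 * k
            energy = transfer_energy(atoms, tilt, shift)
            if best is None or energy < best[0]:
                best = (energy, shift, tilt)
    return best


# Hypothetical TM-like segment along z: apolar atoms from -12 to 12 A, polar caps at +/-18 A.
segment = [(0.0, 0.0, float(z), -1.0) for z in range(-12, 13, 2)]
segment += [(0.0, 0.0, -18.0, 3.0), (0.0, 0.0, 18.0, 3.0)]
energy, shift, tilt = optimize_position(segment)
print(f"tilt={tilt} deg, shift={shift} A, energy={energy:.2f} kcal/mol")
```

For this symmetric test segment, the optimum is an untilted, centered position. That is the expected result for a TM helix whose polar ends must stay in water. Tilting or shifting it pulls a polar cap into the core.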

The PPM 2.0 method automatically discriminates TM and peripheral/monotopic proteins based on three things: membrane penetration depth, transfer energy (ΔG_transf) and whether one or two membrane boundary planes are detected. For integral membrane proteins and peptides, ΔG_transf is usually between −400 and −10 kcal/mol. For peripheral proteins, the calculated ΔG_transf varies between −15 and −1.5 kcal/mol. Proteins with marginal ΔG_transf values (between −1.5 and −5 kcal/mol) are in the gray zone. Their potential membrane binding sites should be treated with caution, because some of them might represent hydrophobic spots involved in protein-protein interactions. To distinguish membrane-bound proteins, additional criteria are needed:

(i) similar membrane-binding modes are found for proteins from the same superfamily;
(ii) calculated membrane boundaries are spatially close to potential binding sites for lipids or other hydrophobic ligands, to lipidated residues or to TM helices that are missing in the crystal structure;
(iii) some experimental indication of protein-membrane interaction is found in the literature for this or a closely homologous protein.

Proteins from the gray zone that do not satisfy at least two of these additional criteria cannot be reliably positioned in membranes and are therefore not included in OPM. For the same reason, several kinds of structures are not included in the database: some structures of short protein fragments that lack membrane-anchoring elements, Cα-atom models, some NMR models with poorly defined disordered loops, and theoretical models.
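The decision logic of this paragraph can be summarized as a small rule set. The Python sketch below encodes only the thresholds quoted above. Treating "crosses both boundary planes" as the sole TM test is a simplifying assumption, since PPM 2.0 also uses depth and energy. The `Prediction` fields are hypothetical inputs that in practice come from the calculation and from manual curation. Labeling energies above −1.5 kcal/mol as "not included" is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical inputs for one protein; the three flags come from manual curation."""
    dg_transf: float                      # calculated transfer energy, kcal/mol
    n_boundary_planes: int                # boundary planes crossed (2 = spans bilayer)
    superfamily_consistent: bool = False  # criterion (i)
    near_anchor_sites: bool = False       # criterion (ii)
    experimental_support: bool = False    # criterion (iii)


def triage(p):
    """Simplified decision rule using the thresholds quoted in the text."""
    if p.n_boundary_planes == 2:
        return "transmembrane"
    if p.dg_transf > -1.5:          # weaker than the peripheral range (assumption)
        return "not included"
    if p.dg_transf >= -5.0:         # gray zone: between -5 and -1.5 kcal/mol
        n_met = sum([p.superfamily_consistent, p.near_anchor_sites, p.experimental_support])
        return "peripheral" if n_met >= 2 else "excluded (gray zone)"
    return "peripheral"


print(triage(Prediction(-60.0, 2)))                    # transmembrane
print(triage(Prediction(-8.0, 1)))                     # peripheral
print(triage(Prediction(-3.0, 1, True, False, True)))  # peripheral
print(triage(Prediction(-3.0, 1, True)))               # excluded (gray zone)
```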

The accuracy of PPM predictions was thoroughly tested for a large set of TM and peripheral proteins, peptides and small molecules whose membrane penetration depths, orientations with respect to the lipid bilayer or membrane binding affinities have been studied experimentally (9,10,27). The method always reproduced the sets of residues penetrating into the lipid bilayer according to spin-labeling, fluorescence and chemical modification studies. The accuracy of the calculated membrane-binding energies was assessed as the root-mean-square error (RMSE) between experimental and calculated values. It was 0.74 kcal/mol for small molecules and 1.13 kcal/mol for peripheral proteins (27).

However, proteins are highly dynamic rather than occupying a fixed spatial position in the membrane. To evaluate the uncertainty in protein orientation, we calculated fluctuations of the tilt angle, membrane penetration depth and hydrophobic thickness within 1 kcal/mol around the global energy minimum for every protein structure. The values of these fluctuations are provided in OPM. The uncertainties in spatial positions can also be estimated by comparing different structures of the same protein. They are relatively small for TM proteins: about 1 Å for the hydrophobic thickness and approximately 5° for the tilt angle. They are larger for peripheral proteins, especially for NMR models with poorly defined conformations of membrane-interacting loops, where the uncertainty in tilt angle may reach 50°. Large differences in orientation may be observed between alternative conformations of a protein. For example, distinct conformations of Ca2+-ATPase, a TM α-helical protein, differ in protein tilt by 17° (PDB IDs: 1su4, 3b8c) and in membrane thickness by 3 Å (PDB IDs: 2zbd, 3ar8), which may be of functional importance.
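OPM reports fluctuations within 1 kcal/mol of the global energy minimum. One plausible way to compute such a ± value from an energy scan over a single parameter is sketched below. Two choices are assumptions, because the text does not specify the procedure: the window is limited to the contiguous basin around the minimum, and the larger of the two one-sided deviations is reported. The example energy profile is synthetic.

```python
def fluctuation(values, energies, window=1.0):
    """Optimum and +/- spread of the contiguous basin within `window` kcal/mol.

    values must be sorted ascending (e.g. tilt angles of a scan); energies[i]
    is the transfer energy at values[i].
    """
    if not values or len(values) != len(energies):
        raise ValueError("values and energies must be non-empty and equal length")
    i_min = min(range(len(energies)), key=energies.__getitem__)
    e_min = energies[i_min]
    lo = hi = i_min
    # Walk outwards from the minimum while the energy stays inside the window.
    while lo > 0 and energies[lo - 1] - e_min <= window:
        lo -= 1
    while hi < len(energies) - 1 and energies[hi + 1] - e_min <= window:
        hi += 1
    v_opt = values[i_min]
    return v_opt, max(v_opt - values[lo], values[hi] - v_opt)


# Synthetic tilt scan: parabolic well with its minimum at 20 degrees.
tilts = list(range(0, 91))
scan = [-10.0 + 0.03 * (t - 20) ** 2 for t in tilts]
print(fluctuation(tilts, scan))  # (20, 5)
```

The same function would apply to a scan over depth or hydrophobic thickness. A flatter energy well gives a larger ± value, which is consistent with the larger uncertainties described for peripheral proteins.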



DATABASE ACCESS 

Access to the OPM database is through the web site at http://opm.phar.umich.edu/. Pages are dynamically generated for every level of the hierarchical classification, including superfamilies, families and individual proteins. The 'representative' protein pages can be reached from any higher hierarchy page or by searching the database by PDB code or protein name. The 'related' protein pages can be reached through internal links from the 'representative' protein pages or by searching by PDB code.

To facilitate data retrieval and analysis, higher hierarchy pages are organized in protein lists and tables supplemented by protein images and internal and external links. For example, to compare the membrane interaction modes of evolutionarily related proteins from the database, one can navigate to a protein superfamily page, which simultaneously displays images of all proteins from the superfamily with calculated membrane boundaries. Tables are automatically generated for every protein type, class, superfamily, family, membrane localization and source organism. Proteins in these tables can be sorted by the following fields: protein family code, protein name, PDB ID, biological source, destination membrane, number of TM α-helices or β-strands, number of subunits, transfer energy and orientational parameters of proteins in membranes.

All coordinate files of protein structures with hydrocarbon core boundaries marked by dummy atoms can be downloaded individually for each protein. They can also be downloaded as a single file for various protein sets: α-helical polytopic proteins, α-helical bitopic proteins, β-barrel proteins, monotopic and peripheral proteins, and peptides. Lists of PDB codes for every protein family, superfamily, class and type are automatically generated at the beginning of every table for the corresponding protein set. Updated releases of the database will be provided semiannually as downloadable SQL files.

Visualization is provided by static images and dynamic visualization tools. Static molecular images in PNG format are automatically generated using scripts for the PyMOL molecular graphics software (36). Proteins with calculated membrane boundaries can be interactively displayed in Chime, Jmol (37) or WebMol (38). These tools make the orientation of the protein relative to both membrane sides, and its packing across the membrane, readily visible. The whole gallery of protein images can be retrieved separately. The database provides links to TCDB (6) and Pfam (32) from family and superfamily pages, and to SCOP (33), PDB (3), PDBsum (39), PDBe (29), OCA (40) and MMDB (41) from protein pages. Links to CGDB (12), MPKS (4), PDBTM (25) and MPDB (5) are also provided. Links to the OPM database are currently integrated into several widely used resources, including PDBsum, OCA, Wikipedia, Membrane Builder (42) and the Cell Microcosmos Membrane Editor (43).



MAINTENANCE AND UPDATES

OPM was developed with PHP, MySQL and the Smarty template engine, which separates the program logic (PHP, MySQL) from the presentation (XHTML, CSS, JavaScript) and enables caching. The database is populated by experimental structures of proteins and their complexes extracted from the PDB. Some of the structures were modified using PPM 2.0 to reconstruct missing side-chain atoms and to optimize side-chain conformers at the membrane interface. Database curation includes the following steps, as described above:

- selection of 'representative' PDB entries;
- identification of topology, localization and oligomeric state using available informatics resources;
- classification of proteins into families and superfamilies;
- verification of the predicted arrangement in the membrane.

The OPM content is updated using queries and online forms that we have developed. The data for TM proteins are normally updated every 2 weeks. Newly released TM structures are regularly retrieved from the PDB through PDBTM or through the combined PDBe/UniProt/InterPro keyword search implemented in PDBe (29). Updating peripheral proteins is significantly more time-consuming and is therefore done once a year. To identify peripheral proteins, we first screen PDB entries automatically using PPM 2.0 and the selection criteria mentioned above. The results are then automatically compared with lists of proteins indicated as membrane-associated by the Pfam, PDBe, UniProt or InterPro databases. This is followed by manual analysis of the results and examination of related publications.
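The cross-check step of the peripheral-protein update can be sketched with simple set operations. In this hypothetical Python illustration, `ppm_hits` stands for the PDB IDs that pass the automatic PPM 2.0 screen. `annotation_sources` stands for the lists of membrane-associated entries from resources such as Pfam, PDBe, UniProt or InterPro. The IDs are placeholders, and as described above, both output groups would still go to manual analysis.

```python
def triage_candidates(ppm_hits, annotation_sources):
    """Split PPM-screened PDB IDs by whether any annotation list also flags them.

    ppm_hits: set of PDB IDs passing the automatic PPM 2.0 screen.
    annotation_sources: mapping of source name -> set of membrane-associated PDB IDs.
    Returns (supported, unsupported): supported maps ID -> sorted source names.
    """
    supported, unsupported = {}, []
    for pdb_id in sorted(ppm_hits):
        sources = sorted(name for name, ids in annotation_sources.items() if pdb_id in ids)
        if sources:
            supported[pdb_id] = sources
        else:
            unsupported.append(pdb_id)
    return supported, unsupported


# Placeholder IDs, not real PDB entries.
hits = {"p001", "p002", "p003"}
annotations = {"Pfam": {"p001"}, "UniProt": {"p001", "p003"}, "InterPro": set()}
supported, unsupported = triage_candidates(hits, annotations)
print(supported)    # {'p001': ['Pfam', 'UniProt'], 'p003': ['UniProt']}
print(unsupported)  # ['p002']
```

Recording which sources agree is one way to prioritize manual review. Candidates flagged by several resources are likely quicker to confirm than those supported only by the PPM screen.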



PPM SERVER 

To provide a web tool for calculating spatial positions of proteins in the lipid bilayer, we designed the PPM server, which implements our PPM 2.0 method. The PPM server can be used to position new structures in membranes before they are deposited in the PDB. This applies to newly determined experimental structures or theoretical models of TM proteins, peripheral proteins or peptides. The majority of TM proteins (1326 entries) and a large part of peripheral membrane proteins (2230 entries) from the PDB have already been precalculated by our method and can be found in the OPM database.

On the web interface of the PPM server, the user can upload the atomic coordinates of a protein or peptide, whose arrangement in the lipid bilayer is then evaluated by PPM 2.0. The protein structure should be in a biologically relevant oligomeric state and include all side-chain atoms that may interact with lipids. The user has the option to specify the topology of the protein and to include ligands (lipids, cofactors, etc.) in the calculation.

The calculation of protein positions in the lipid bilayer may take from a few seconds to a few minutes, depending on the number of atoms. The output window displays the following orientational parameters:

- membrane penetration depth for peripheral proteins or hydrophobic thickness for TM proteins (Å);
- tilt angle (°);
- water-to-membrane transfer energy (kcal/mol).

The fluctuations of depth/hydrophobic thickness and tilt angle are defined within 1 kcal/mol around the global minimum of transfer energy and are indicated by ± values. The output also contains the TM segments of integral proteins and a list of membrane-embedded residues for all proteins. Downloadable atomic coordinates of the protein are provided, together with the positions of the hydrophobic core boundaries marked by dummy atoms. Interactive visualization of the protein with the calculated membrane boundaries is provided by Jmol (37). The server is hosted on a LAMP-type (Linux, Apache, MySQL, Perl/PHP/Python) virtual server at the University of Michigan.

We compared the PPM server with other existing web servers for positioning proteins in membranes: Ez (22), TMDET (23), MAPS (24) and MAPAS (16). The comparison demonstrates that PPM clearly outperforms all of them in scope and accuracy and is the only server that correctly predicts membrane-binding sites of peripheral proteins (see Supplementary Data).

CONCLUSIONS 

The OPM database is the first comprehensive resource for membrane-associated peptides and proteins with known structures whose arrangement in membranes can be reliably assessed by the PPM 2.0 method. This method is based on evaluating the free energy of transfer of molecules from water to the anisotropic lipid environment. We also provide a web tool, the PPM server, for proteins not yet included in the OPM database. It enables users to evaluate the membrane binding energy of such proteins and the parameters of their spatial arrangement in the lipid bilayer.

OPM is heavily used, with more than 435 000 unique visits since its first release (from 4000 to 10 000 first-time visitors and from 500 to 1200 returning visitors each month). The availability of the OPM database contributes to advances in basic scientific research. These include understanding the physics of protein-membrane interactions and determining the role of protein-lipid interactions in molecular transport, signal transduction, membrane transformations and the formation of multi-protein functional units. They also include comparative analysis of the mechanisms of insertion and translocation of proteins from different families into or across membranes. We are dedicated to incorporating new data in a timely manner as long as funding support is available.

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 